Calculating confidence intervals for prediction error in microarray classification using resampling.

Authors

  • Wenyu Jiang
  • Sudhir Varma
  • Richard Simon
Abstract

Cross-validation-based point estimates of prediction accuracy are frequently reported in microarray class prediction problems. However, these point estimates can be highly variable, particularly for small sample sizes, so it would be useful to report confidence intervals for prediction accuracy. We performed an extensive study of existing confidence interval methods and compared their performance in terms of empirical coverage and width. We developed a bootstrap case cross-validation (BCCV) resampling scheme and defined several confidence interval methods using BCCV with and without bias correction. The widely used approach of basing confidence intervals on the assumption that the leave-one-out cross-validation errors are independent binomial trials results in serious under-coverage of the true prediction error. Two split-sample methods previously proposed in the literature tend to give overly conservative confidence intervals. Under BCCV resampling, the percentile confidence interval method was also found to be overly conservative without bias correction, while Efron's bias-corrected and accelerated (BCa) interval method returns substantially anti-conservative confidence intervals. We propose a simple bias reduction for the BCCV percentile interval. The method provides mildly conservative inference under all circumstances studied and outperforms the other methods in microarray applications with small to moderate sample sizes.
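The sketch below contrasts two of the interval types discussed above. The first is the naive interval that treats the n leave-one-out errors as independent binomial trials. The second is a percentile interval over cross-validation errors computed on bootstrap samples of cases. Several pieces are assumptions rather than the paper's method. The classifier is a simple nearest-centroid rule, and the data are hypothetical two-class Gaussian samples. The binomial interval uses the Wald normal approximation. BCCV is read here as: resample cases with replacement, then run leave-one-out CV on each bootstrap sample while withholding every copy of the left-out case from training. All function names are hypothetical. The authors' bias-reduced percentile interval and Efron's BCa interval are not reproduced.

```python
import random
from statistics import NormalDist, fmean


def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose mean vector is closest (squared Euclidean).

    Hypothetical stand-in classifier chosen for brevity, not the paper's classifiers.
    """
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def loocv_error(xs, ys, ids=None):
    """Leave-one-out CV error rate.

    ids marks rows that are copies of the same original case; every row sharing
    the left-out row's id is withheld from training (matters for bootstrap samples).
    """
    ids = list(range(len(xs))) if ids is None else ids
    mistakes = 0
    for i in range(len(xs)):
        train = [j for j in range(len(xs)) if ids[j] != ids[i]]
        pred = nearest_centroid_predict([xs[j] for j in train], [ys[j] for j in train], xs[i])
        mistakes += pred != ys[i]
    return mistakes / len(xs)


def binomial_interval(err, n, level=0.95):
    """Wald interval treating the n LOOCV errors as independent Bernoulli trials
    (the assumption the abstract reports as under-covering)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * (err * (1 - err) / n) ** 0.5
    return max(0.0, err - half), min(1.0, err + half)


def bccv_percentile_interval(xs, ys, n_boot=200, level=0.95, seed=0):
    """Percentile interval of LOOCV errors over bootstrap samples of cases
    (an assumed reading of BCCV, without any bias correction)."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        if len(set(idx)) < 2:  # need at least one other case to train on
            continue
        stats.append(loocv_error([xs[i] for i in idx], [ys[i] for i in idx], ids=idx))
    stats.sort()
    alpha = 1 - level
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def make_data(n=30, p=5, shift=1.0, seed=1):
    """Hypothetical data: class 1 has its mean shifted by `shift` in every feature."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        xs.append([rng.gauss(shift * y, 1.0) for _ in range(p)])
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_data()
    err = loocv_error(xs, ys)
    print("LOOCV error:", round(err, 3))
    print("naive binomial 95% CI:", binomial_interval(err, len(xs)))
    print("BCCV percentile 95% CI:", bccv_percentile_interval(xs, ys))
```

Withholding all copies of the left-out case matters. With plain bootstrap resampling, a duplicated case would otherwise appear in its own training set, which would bias the resampled error estimates downward. The percentile endpoints here use a simple index rule. A production implementation would want a more careful quantile definition and many more bootstrap replicates.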


Similar resources

Maximum Separation Partial Least Squares (MSPLS): a New Method for Classification in Microarray Experiment

The purpose of the paper is to propose a new method for classification. Our MSPLS method was deduced from the classic Partial Least Squares (PLS) algorithm. In this method we applied the Maximum Separation Criterion. On the basis of the approach we are able to find such weight vectors that the dispersion between the classes is maximal and the dispersion within the classes is minimal. In order t...


Semiparametric Bootstrap Prediction Intervals in Time Series

One of the main goals of studying the time series is estimation of prediction interval based on an observed sample path of the process. In recent years, different semiparametric bootstrap methods have been proposed to find the prediction intervals without any assumption of error distribution. In semiparametric bootstrap methods, a linear process is approximated by an autoregressive process. The...


A SAS Macro for Calculating Bootstrapped Confidence Intervals About a Kappa Coefficient

Cohen’s kappa coefficient has become a standard method for measuring the degree of agreement between two raters. Confidence intervals for kappa and weighted kappa based on its asymptotic variance are available in the SAS system through the FREQ procedure. However, this variance can become unreliable as sample size decreases or as kappa approaches unity. This paper presents a SAS macro for calcu...


Ideal bootstrap estimation of expected prediction error for k-nearest neighbor classifiers: Applications for classification and error assessment

Euclidean distance k-nearest neighbor (k-NN) classifiers are simple nonparametric classification rules. Bootstrap methods, widely used for estimating the expected prediction error of classification rules, are motivated by the objective of calculating the ideal bootstrap estimate of expected prediction error. In practice, bootstrap methods use Monte Carlo resampling to estimate the ideal boot...


Improving Classification Accuracy Assessments with Statistical Bootstrap Resampling Techniques

The use of remotely sensed imagery to generate land cover models is common today. Validation of these models typically involves the use of an independent set of ground-truth data which are used to calculate an error matrix resulting in estimates of omission, commission, and overall error. However, each estimate of error contains a degree of uncertainty itself due to 1) conceptual bias, 2) locat...



Journal title:
  • Statistical Applications in Genetics and Molecular Biology

Volume 7, Issue 1

Publication date: 2008